Abstract


Two Monte Carlo studies explore the relation between the tau measure of
intersession response variability and the stress of the corresponding multidimensional
scaling solution, thereby providing a statistical basis for evaluating the
goodness-of-fit of a spatial configuration. In the first Monte Carlo study, the stress and
tau  of  10-,  16-,  and  30-point  configurations  in  1,  2,  3,  and  4 dimensions  are 
shown  to  be  linear  functions  of  the  internal  error  level.  In  the  second  study, 
these  relations  are  shown  to  be  relatively  invariant  with  respect  to  the  particular 
configurations.  Three  methods  are  proposed  for  establishing  acceptable  levels 
of  stress  for  heuristic  and  for  constrained  multidimensional  scaling. 
Statistical guidelines for evaluating stress are essential for the
successful application of multidimensional scaling. For this reason, Monte
Carlo studies have been published establishing correspondences between internal
error levels and stress values (Young, 1970; Sherman, 1972; Spence & Graef, 1974;
Cohen & Jones, 1974). Unfortunately, all previous investigations have shortcomings
(see Arabie, 1973): 1) stress values are inflated by local-minimum problems;
2) scaling interpoint distances plus noise, or random rank-order dissimilarities,
does not guarantee recovery of the interpretation of the original configuration;
3) there is currently no way to estimate independently the error level that
determines the distribution of stresses; and 4) all previous Monte Carlo studies
concentrate only on local-minimum solutions, but scaling with constraints
(Noma & Johnson, 1977) often produces suboptimal-stress solutions. In this paper
a method around these complications is proposed.

In  section  2,  it  is  argued  that  the  latent  configuration  is  best  assumed 
equivalent  to  the  scaled  configuration.  This  assumption  avoids  both  inflated 
stresses due to recovery of suboptimal solutions and the recovery of
non-representative solutions. The Monte Carlo methodology relating error to stress
and  error  to  intersession  variability  is  introduced  in  section  3.  In  section  4, 
the  results  of  two  Monte  Carlo  studies  are  presented.  Ways  of  applying  intersession 
variability  to  evaluate  stress  appear  in  section  5. 

2.  The  Latent  Configuration 

Multidimensional scaling has been applied in two ways:

1) heuristic (standard) multidimensional scaling searches for structure in the data;

2) constrained multidimensional scaling emphasizes hypothesis testing.

When multidimensional scaling is used as a heuristic tool, it is
customarily assumed that the algorithm constructs a configuration that approximates
a latent or "true" configuration. Also, only one scaled configuration is of
interest: the local-minimum solution. Scaling with constraints, however,
produces configurations that are often suboptimal in terms of stress level.

In  addition,  from  a single  dissimilarity  set,  many  different  configurations  are 
produced  by  varying  the  constraints  placed  on  interpoint  distances  (Borg  & Lingoes, 
1978),  point  coordinates  (Bentler  & Weeks,  1978;  Bloxom,  1978),  or  order  of  point 
coordinates  (Noma  & Johnson,  1977).  Each  configuration  may  also  have  a stress 
comparable to that of the local-minimum solution yet illuminate a different
structure in the data. This means that potentially many configurations could be
representative  of  structure  in  the  data.  Since  any  one  of  these  configurations, 
or  none  of  them,  may  be  the  latent  configuration,  the  latent  configuration  is 
best  defined  as  the  configuration  produced  by  the  scaling  algorithm.  This 
simplifying  assumption  also  allows  the  separation  of  the  recovery  of  the  original 
structure  and  the  production  of  the  lowest  attainable  stress  level.  That  is, 
by  dictating  that  the  structure  is  perfectly  recovered,  the  stress  may  be  examined 
alone.  Also  there  is  no  possibility  of  suboptimal  stress  for  a given  dissimilarity 
set. 

3.  Methodology 

By equating the latent and scaled configurations, the question becomes: given
a configuration (C), how much noise must be added to the interpoint distances
(D) to produce a given stress ($S'$)? That is, a matrix of interpoint distances
plus noise (denoted by $D'$) is computed for a given configuration. The matrix
and the given configuration are input to a scaling program, which computes a
stress value after zero iterations.
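
As an illustration of what a stress value "after zero iterations" involves, the following is a minimal sketch that assumes the scaling program reports Kruskal's stress formula 1 with a monotone (least-squares isotonic) regression of the configuration distances on the rank order of the dissimilarities. The function names `pava` and `stress1` are hypothetical, and ties in the dissimilarities are simply left in sort order; the paper does not state which stress formula or tie treatment was used.

```python
import math

def pava(values):
    """Least-squares non-decreasing fit by pooling adjacent violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block mean exceeds the mean of the block after it.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress1(config_distances, dissimilarities):
    """Stress of a fixed configuration against one dissimilarity set.

    Assumed form: sqrt(sum (d - dhat)^2 / sum d^2), where dhat is the
    monotone regression of the distances d on the dissimilarity order.
    """
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    d_sorted = [config_distances[i] for i in order]
    d_hat = pava(d_sorted)
    num = sum((d - h) ** 2 for d, h in zip(d_sorted, d_hat))
    den = sum(d ** 2 for d in d_sorted)
    return math.sqrt(num / den)
```

Because the configuration is held fixed, no minimization takes place: the stress is evaluated once for the given pair of configuration and dissimilarity set.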

To generate the distance-plus-noise matrix, the procedure described by
Sherman (1972; Hefner, 1958; Ramsay, 1969) is used. Briefly, the procedure
may be summarized as follows: 1) After specifying the number of points (N) and
dimensionality (d), a configuration is randomly generated in a d-dimensional
unit hypercube. 2) A number called the level of noise is computed by multiplying
a specified error level (E) by the variance of the N*d coordinates ($\sigma_c^2$).
3) The elements of the dissimilarity matrix are generated by adding noise to the
Euclidean distance between all N(N-1)/2 pairs of points:

$d_{ij} = \left[ \sum_{k=1}^{d} (x_{ik} - x_{jk} + e_{ijk})^2 \right]^{1/2}$

where $e_{ijk}$ is a random variable distributed as $N(0, 2\sigma_c^2 E^2)$. 4) From these
dissimilarities, the stress of the latent configuration is computed:

$S' = f(C, D')$

For  a given  number  of  points,  dimensionality,  and  configuration,  many  simulated 
dissimilarity  sets  at  a given  error  level  will  map  out  a distribution  of  stress 
values.
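
The generation of one simulated dissimilarity set can be sketched as follows. This is only an illustration under stated assumptions: the noise term added to each coordinate difference is taken to be normal with mean 0 and variance 2 * var_c * E**2, where var_c is the variance of the N*d coordinates (the variance expression is hard to read in the source, so this reading is an assumption), and the function names are hypothetical.

```python
import math
import random

def random_configuration(n_points, n_dims, rng):
    """N points drawn uniformly in the d-dimensional unit hypercube."""
    return [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]

def coordinate_variance(config):
    """Variance of all N*d coordinates taken together."""
    coords = [x for point in config for x in point]
    mean = sum(coords) / len(coords)
    return sum((x - mean) ** 2 for x in coords) / len(coords)

def noisy_dissimilarities(config, error_level, rng):
    """One dissimilarity set: N(N-1)/2 distances with noise on each coordinate difference."""
    var_c = coordinate_variance(config)
    # Assumed noise model: Var(e_ijk) = 2 * var_c * E**2.
    sd = math.sqrt(2.0 * var_c) * error_level
    n = len(config)
    out = []
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for xi, xj in zip(config[i], config[j]):
                total += (xi - xj + rng.gauss(0.0, sd)) ** 2
            out.append(math.sqrt(total))
    return out

rng = random.Random(0)
config = random_configuration(16, 2, rng)
# Five simulated dissimilarity sets at a single error level.
sets = [noisy_dissimilarities(config, 0.25, rng) for _ in range(5)]
```

Repeating the last line at several error levels, and computing the stress of `config` against each set, maps out the stress distribution described above.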

One  measure  of  noise  in  the  data  to  be  scaled  is  the  intersession  variability. 
Due  to  the  assumed  ordinal  nature  of  the  input  to  the  multidimensional  scaling 
algorithm, the tau statistic (Kendall, 1962) is used as the measure of the
correlation between dissimilarity sets from one session to another. By averaging taus
from  all  pairs  of  dissimilarity  sets  one  can  derive  the  expected  error  level. 

For  instance,  intersession  taus  near  unity  imply  that  the  error  level  is  low  so 
only  configurations  with  near-zero  stresses  are  acceptable.  Configurations  with 
stresses  outside  acceptable  error  bounds  are  considered  inadequate  representations 
of  the  data. 
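
The averaging of taus over all pairs of dissimilarity sets can be sketched as below. The sketch assumes the simplest form of Kendall's statistic (tau-a, in which tied pairs count as neither concordant nor discordant); the source does not say which variant was used, and the function names are hypothetical.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

def mean_intersession_tau(sessions):
    """Average tau over every pair of sessions (dissimilarity sets)."""
    taus = [kendall_tau(x, y) for x, y in combinations(sessions, 2)]
    return sum(taus) / len(taus)
```

Each element of `sessions` is one dissimilarity set, listed in the same pair order for every session.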

4.  Results

Two Monte Carlo studies were done. The first characterizes the relationship
of error level to mean stress and mean tau for specific configurations. The
second determines the sensitivity of the error-stress and error-tau relationships
to different configurations.

In  the  first,  arbitrary  configurations  were  chosen  with  10,  16,  and  30 
points  in  1,  2,  3,  and  4 dimensions.  For  each  of  the  12  configurations,  five 
dissimilarity  sets  were  generated  at  error  levels  increasing  from  E = .025  by 
steps of .025. Figures 1 and 2 show typical error-tau and error-stress
relationships from E = .025 to E = 1.5. Note that the functions are nearly linear up
to  about  E = .750  before  reaching  asymptotes  at  stress  = 45%  and  tau  = 0. 

These  relationships  seem  to  typify  all  curves  produced  since  all  regressions 
using  E values  in  the  range  .025  to  .500  had  correlations  in  excess  of  .94.  Since 
scaling  solutions  would  be  excluded  from  further  analysis  with  stress  over  45% 
or  intersession  tau  near  0,  all  further  analysis  was  done  for  error  levels  from 
.025  to  .5. 


Figures  1 and  2 about  here 

The  second  Monte  Carlo  study  explores  the  relation  of  the  slopes  of  the 
error-stress  and  error-tau  functions  to  the  number  of  points,  dimensionality, 
and  specific  configurations.  Fifty  dissimilarity  sets  were  generated  at  each 
combination of 5 different random configurations at N = 10, 16, 30, d = 1, 2, 3, and
3 error levels (the error levels were picked to produce mean taus of approximately
.5, .75, and .9 as predicted by the regression coefficients obtained in the first
Monte Carlo study). For each of the 135 configurations (5 x 3 x 3 x 3), means
and variances of the stress distributions were computed for the set of 50
dissimilarity sets. To save computation, taus were computed only between the first
15 of the 50 dissimilarity sets. Means and variances of these 105 values,
(15(15 - 1)/2), were computed for each configuration.
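
The size of this design can be checked with a few lines of arithmetic. The variable names are hypothetical, and the three error levels are represented by the mean taus they were chosen to produce, since the error values themselves are not listed.

```python
from itertools import product
from math import comb

configs_per_cell = 5
n_points = (10, 16, 30)
n_dims = (1, 2, 3)
target_taus = (0.5, 0.75, 0.9)  # stand-ins for the three error levels

# One entry per configuration-by-error-level combination.
cells = list(product(range(configs_per_cell), n_points, n_dims, target_taus))
sets_per_cell = 50
tau_sets = 15

print(len(cells))                  # 135
print(len(cells) * sets_per_cell)  # 6750 simulated dissimilarity sets
print(comb(tau_sets, 2))           # 105 taus per combination
```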

In accordance with the results of the first Monte Carlo study, for each
fixed level of N and d, the error-mean stress and error-mean tau correlations
were all in excess of .98. However, the 27 analyses of variance of the stress
distributions across the five configurations for a given N, d, and error level
were all significant at p < .05. Therefore it must be concluded that the two
functions,  the  one  relating  error  and  mean  stress  and  the  one  relating  error  and 
mean  tau,  are  only  relatively  invariant  with  respect  to  specific  configurations. 
Two other issues are of interest: the mean tau-to-mean stress relation and the
variance of the taus and stresses for a given level of tau and stress. To
estimate the slope of the tau-stress function, $\mu_{Stress}/(1-\mu_{tau})$ was computed for each
of the 135 configurations. Figure 3 shows the geometric means of these values
for all combinations of N and d. The log ($\mu_{Stress}/(1-\mu_{tau})$) values are then regressed on
log (N) and log (d), yielding the following equation:

(1)  $\mu_{Stress}/(1-\mu_{tau}) = e^{-1.5301} N^{.25377} d^{-.25498}$,  r = .729

Similarly, the standard deviation-to-mean ratios of tau and stress (see Figures
4 and 5) were computed, yielding these equations:

(2)  $\sigma_{tau}/\mu_{tau} = e^{-.66718} N^{-1.1?13}$,  r = .970

(3)  $\sigma_{Stress}/\mu_{Stress} = e^{1.1278} N^{-1.3123}$,  r = .954


Figures 3, 4, and 5 about here
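
As a worked example of equation (1), the sketch below evaluates the fitted stress-to-(1 - tau) ratio for a given number of points and dimensionality. It assumes the coefficients of equation (1) read as exp(-1.5301), N to the power .25377, and d to the power -.25498; the function name is hypothetical.

```python
import math

def stress_tau_ratio(n_points, n_dims):
    """Fitted ratio mu_stress / (1 - mu_tau) from equation (1), as read from the source."""
    return math.exp(-1.5301) * n_points ** 0.25377 * n_dims ** -0.25498

# Example: a 16-point configuration in two dimensions gives a ratio of about 0.367.
ratio_16_2 = stress_tau_ratio(16, 2)
```

Under this reading the ratio grows with the number of points and shrinks with the dimensionality, so the same intersession tau is compatible with a higher stress when more points are scaled in fewer dimensions.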
5.  Discussion

Previous  techniques  for  statistically  evaluating  stress  are  inadequate  for 
a variety  of  reasons.  Fixed  criteria  (Kruskal,  1964)  are  affected  by  the  number 
of  points  and  the  dimensionality.  "Looking  for  the  elbow"  requires  the  existence 
of  such  an  elbow.  Other  Monte  Carlo  studies  have  the  shortcomings  of  inflated 
stresses  due  to  local-minimum  problems  (Arabie,  1973;  Spence,  1974).  Evaluating 
the  output  of  constrained  multidimensional  scaling  programs  is  even  more  difficult 
since the scaled configurations are usually not local-minimum solutions. Therefore
all previous Monte Carlo studies are inappropriate since they deal only with
local-minimum  solutions.  Attempts  to  extend  the  Monte  Carlo  results  by  counting 
the  number  and  type  of  constraints  also  appear  inadequate  (see  Noma  & Johnson,  1977). 
In  this  section,  three  different  methods  are  proposed  for  establishing  acceptable 
bounds  on  stress  in  heuristic  multidimensional  scaling.  The  first  two  are  also 
applicable  to  constrained  multidimensional  scaling. 

All three methods are based on comparisons of mean stress and mean tau. To
compute mean stress ($\bar{S}$), a single configuration ($C'$) is produced using some average
of responses over replications in a two-way analysis or a group space from a
three-way analysis (e.g. INDSCAL - Carroll & Wish, 1974). The stresses are then computed
for each of the r replications with the same configuration:

$S_i = f(C', D_i), \quad i = 1, \ldots, r$

and the mean stress is computed. Mean tau ($\bar{\tau}$) is computed by averaging the
taus for all pairs of replications:

$\tau_{ij} = \tau(D_i, D_j), \quad i = 1, \ldots, r; \; j = 1, \ldots, r; \; i \neq j$

In method one, the mean tau predicts the mean stress ($\hat{S}$) using either equation 1
or the appropriate ratio of $\mu_{Stress}/(1-\mu_{tau})$ in Figure 3. The empirical mean stress
($\bar{S}$) must fall within specified confidence bounds of $\hat{S}$ for the configuration
to be acceptable.
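
Method one might be sketched as follows. Several pieces are assumptions rather than the author's stated procedure: the predicted stress is taken as the stress-to-(1 - tau) ratio times (1 - mean tau), the width of the confidence bounds is taken from the standard deviation-to-mean ratio of stress in equation (3) (read as exp(1.1278) times N to the power -1.3123), and a normal multiplier z = 1.96 is used. The function name and parameters are hypothetical.

```python
import math

def method_one_check(mean_stress, mean_tau, n_points, ratio, z=1.96):
    """Return (predicted stress, whether the empirical mean stress is acceptable).

    ratio: value of mu_stress / (1 - mu_tau) for this N and d, taken from
           equation (1) or read from Figure 3.
    z:     assumed normal multiplier for the confidence bounds.
    """
    predicted = ratio * (1.0 - mean_tau)
    # Assumed spread: equation (3) gives sigma_stress / mu_stress as a function of N.
    rel_sd = math.exp(1.1278) * n_points ** -1.3123
    half_width = z * rel_sd * predicted
    return predicted, abs(mean_stress - predicted) <= half_width

# Example call for a 16-point configuration with ratio 0.3667 and mean tau .8.
predicted, acceptable = method_one_check(0.08, 0.8, 16, 0.3667)
```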

In method two, varying amounts of error are added to the interpoint distances
(D) of the scaled configuration ($C'$) to determine an error-stress curve. Assuming
this curve is linear within a reasonable range of error values, the error value
(E) for the empirical stress ($S'$) is derived from the regression equation. The
range of compatible taus is then easily computed using the formula (see Figure 6):

$\tau = -1.2725\,E + 1$,  r = .987

and the variance of $\tau$ is found by using equation (2).
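
The two regression steps of method two can be sketched as below, assuming an ordinary least-squares line for the simulated error-stress curve and the fitted relation tau = -1.2725 E + 1 quoted above. The function names are hypothetical and the simulated stresses in the example are made-up illustrative numbers, not results from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def method_two_tau(error_levels, simulated_stresses, empirical_stress):
    """Invert the error-stress line for E, then predict the compatible tau."""
    slope, intercept = fit_line(error_levels, simulated_stresses)
    e_hat = (empirical_stress - intercept) / slope
    return e_hat, -1.2725 * e_hat + 1.0

# Illustrative (invented) error-stress curve for one scaled configuration.
levels = [0.1, 0.2, 0.3, 0.4]
stresses = [0.05, 0.10, 0.15, 0.20]
e_hat, tau_hat = method_two_tau(levels, stresses, 0.10)
```

The observed mean intersession tau is then compared with `tau_hat`, using the spread implied by equation (2) to set the acceptable range.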


Figure  6 about  here 

The third method can be applied only to scaled local-minimum solutions.
In contrast to the first two methods, no assumptions are made as to the
relationship between the latent and the scaled configurations. One only assumes that a
latent configuration exists. Previous Monte Carlo studies (e.g. Sherman, 1972)
are first used to estimate the error level (E) given the mean stress ($\bar{S}$). This
error level is then used, as in method two, to determine a range of acceptable
taus.

In all three methods, stresses that are too high indicate an inadequate
configuration. In this case, an attempt should be made to scale the data
in a higher dimensional space. Stresses that are too low indicate a fit that is
too good, and the scaling should be done in a lower dimensional space or with
constraints.
Footnote


This research was supported by the Office of Naval Research, Department
of Defense, under Contract No. N0014-76-0648 with the Human Performance Center,
Department of Psychology, University of Michigan. The author was supported by
a training grant from NIGMS (GM-01231) to the University of Michigan.
Figure Captions

1.  Simulated  taus  for  an  arbitrary  16  point  configuration  in  two  dimensions. 

2.  Simulated  stresses  for  an  arbitrary  16  point  configuration  in  two  dimensions. 

3.  Mean  slope  of  the  stress-tau  relationship  for  the  mean  stress  and  tau  values 
of  the  135  random  configurations.  Lines  describe  the  best  fitting  log-linear 
function  of  N and  d (see  text). 

4.  The  standard  deviation  of  the  tau  distribution  as  a function  of  the  mean  tau 
of  the  135  random  configurations. 

5.  The  standard  deviation  of  the  stress  distribution  as  a function  of  the  mean 
stress  of  the  135  random  configurations. 

6.  Mean  tau  as  a function  of  the  error  level  for  the  135  random  configurations. 